HAT-Trie: A Cache-Conscious Trie-Based Data Structure For Strings
Abstract
Tries are the fastest tree-based data structures for managing strings in-memory, but they are space-intensive. The burst-trie is almost as fast but reduces space by collapsing trie-chains into buckets. This is not, however, a cache-conscious approach and can lead to poor performance on current processors. In this paper, we introduce the HAT-trie, a cache-conscious trie-based data structure that is formed by carefully combining existing components. We evaluate performance using several real-world datasets and against other high-performance data structures. We show strong improvements in both time and space, in most cases approaching the performance of the cache-conscious hash table. Our HAT-trie is shown to be the most efficient trie-based data structure for managing variable-length strings in-memory while maintaining sort order.
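The abstract does not name the components being combined. In the published HAT-trie design they are a burst-trie-style trie whose leaf buckets are cache-conscious array hash tables: strings are stored as unsorted suffixes in a container until it grows past a threshold, at which point the container "bursts" into a new trie node with fresh child containers, and sort order is recovered by sorting each container during an in-order walk. The sketch below models only that logic, under loud assumptions: a Python `set` stands in for the array hash table (so it says nothing about cache behavior), and the names `HatTrieSketch`, `Container`, `TrieNode` and `BURST_LIMIT` are hypothetical, not from the paper.

```python
from typing import Iterator, List, Optional, Set, Union

# Hypothetical burst threshold, kept tiny so bursting is easy to observe.
BURST_LIMIT = 4


class Container:
    """Stand-in for a cache-conscious array hash table: unsorted suffixes."""

    def __init__(self) -> None:
        self.suffixes: Set[bytes] = set()


class TrieNode:
    """Trie node with one child slot per byte value (0-255)."""

    def __init__(self) -> None:
        self.children: List[Optional[Union["TrieNode", Container]]] = [None] * 256
        self.end_of_string = False  # a key ends exactly at this node


class HatTrieSketch:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, key: bytes) -> None:
        node, i = self.root, 0
        while i < len(key):
            child = node.children[key[i]]
            if child is None:
                child = Container()
                node.children[key[i]] = child
            if isinstance(child, TrieNode):
                node, i = child, i + 1
                continue
            # Container: store the suffix after the byte that selected it.
            child.suffixes.add(key[i + 1:])
            if len(child.suffixes) > BURST_LIMIT:
                node.children[key[i]] = self._burst(child)
            return
        node.end_of_string = True  # key consumed entirely by trie nodes

    def _burst(self, container: Container) -> TrieNode:
        """Replace an oversized container with a trie node and child containers."""
        node = TrieNode()
        for s in container.suffixes:
            if not s:
                node.end_of_string = True
                continue
            child = node.children[s[0]]
            if child is None:
                child = Container()
                node.children[s[0]] = child
            child.suffixes.add(s[1:])
        # A child may still be oversized if many suffixes share a first byte.
        for b, child in enumerate(node.children):
            if isinstance(child, Container) and len(child.suffixes) > BURST_LIMIT:
                node.children[b] = self._burst(child)
        return node

    def contains(self, key: bytes) -> bool:
        node, i = self.root, 0
        while i < len(key):
            child = node.children[key[i]]
            if child is None:
                return False
            if isinstance(child, Container):
                return key[i + 1:] in child.suffixes
            node, i = child, i + 1
        return node.end_of_string

    def sorted_keys(self) -> Iterator[bytes]:
        """In-order walk; containers are sorted only when traversed."""
        yield from self._walk(self.root, b"")

    def _walk(self, node: TrieNode, prefix: bytes) -> Iterator[bytes]:
        if node.end_of_string:
            yield prefix  # a prefix sorts before all of its extensions
        for byte, child in enumerate(node.children):
            if child is None:
                continue
            p = prefix + bytes([byte])
            if isinstance(child, Container):
                for s in sorted(child.suffixes):
                    yield p + s
            else:
                yield from self._walk(child, p)
```

Inserting `b"car"`, `b"cart"`, `b"care"`, `b"cat"` and `b"cattle"` overflows the container under `c`, which bursts (here twice, since every suffix shares the byte `a`); `sorted_keys()` then yields the keys in byte-lexicographic order. Keeping containers unsorted makes inserts cheap, and the sorting cost is paid only during ordered traversal.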
Similar resources
Trie-Join: Efficient Trie-based String Similarity Joins with Edit-Distance Constraints
A string similarity join finds similar pairs between two collections of strings. It is an essential operation in many applications, such as data integration and cleaning, and has attracted significant attention recently. In this paper, we study string similarity joins with edit-distance constraints. Existing methods usually employ a filter-and-refine framework and have the following disadvantag...
P-Trie Tree: A Novel Tree Structure for Storing Polysemantic Data
A trie is an ordered tree data structure that is used to store a dynamic set or associative array where the keys are usually strings. It makes the search and update of words more efficient and is widely used in the construction of English dictionaries for the storage of English vocabulary. Within the application of big data, efficiency determines the availability and usability of a system. In...
Construction of the CDAWG for a Trie
Trie is a tree structure to represent a set of strings. When the strings have many common prefixes, the number of nodes in the trie is much less than the total length of the strings. In this paper, we propose an algorithm for constructing the Compact Directed Acyclic Word Graph for a trie, which runs in linear time and space with respect to the number of nodes in the trie.
Efficient Trie-Based Sorting of Large Sets of Strings
Sorting is a fundamental algorithmic task. Many general-purpose sorting algorithms have been developed, but efficiency gains can be achieved by designing algorithms for specific kinds of data, such as strings. In previous work we have shown that our burstsort, a trie-based algorithm for sorting strings, is for large data sets more efficient than all previous algorithms for this task. In this pap...
Algorithms and Data Structures for IP Lookups
Ioannidis, Ioannis. Ph.D., Purdue University, May, 2005. Algorithms and Data Structures for IP Lookups. Major Professor: Ananth Grama. The problem of optimizing access mechanisms for IP routing tables is an important and well studied one. Several techniques have been proposed for structuring and managing routing tables, with special emphasis on backbone, high-throughput routing. T...
Publication date: 2007